Abstract

Given a list of $N$ states with probabilities $0 < p_1 \le \cdots \le p_N$, the average
conditional algorithmic information $\bar I$ to specify one of these states obeys the
inequality $H \le \bar I < H + O(1)$, where $H = -\sum_j p_j \log_2 p_j$ and $O(1)$ is a
computer-dependent constant. We show how any universal computer can be
slightly modified in such a way that the inequality becomes $H \le \bar I < H + 1$,
thereby eliminating the computer-dependent constant from statistical physics.

I. INTRODUCTION 



Algorithmic information theory, in combination with Landauer's principle [?],
which specifies the unavoidable energy cost $k_B T \ln 2$ for the erasure of a bit of information
in the presence of a heat reservoir at temperature $T$, has been applied successfully to a range
of problems: the Maxwell demon paradox, a consistent Bayesian approach to statistical
mechanics [?], a treatment of irreversibility in classical Hamiltonian chaotic systems
[10,11], and a characterization of quantum chaos relevant to statistical physics [10,12,13].
The algorithmic information for a physical state is defined as the length in bits of the shortest
self-delimiting program for a universal computer that generates a description of that state
[?]. Algorithmic information with respect to two different universal computers differs at
most by a computer-dependent constant [?]. Although typically the latter can be neglected
in the context of statistical physics, the presence of an arbitrary constant in a physical
theory is unsatisfactory and has led to criticism. In the present paper, we show how the
computer-dependent constant can be eliminated from statistical physics.

In the following paragraphs we give a simplified account of the role of algorithmic infor- 
mation in classical statistical physics. A more complete exposition including the quantum 
case can be found in Refs. [8,10]. We adopt here the information-theoretic approach to
statistical physics pioneered by Jaynes [?]. In this approach, the state of a system represents
the observer's knowledge of the way the system was prepared. States are described by prob- 
ability densities in phase space; observers with different knowledge assign different states to 
the system. Entropy measures the information missing toward a complete specification of 
the system.

Consider a set of $N$ states ($N \ge 2$) labeled by $j = 1, \ldots, N$, all having the same energy
and entropy. The restriction to states of the same energy and entropy is not essential,
but it simplifies the notation. Initially the system is assumed to be in a state in which
state $j$ is occupied with probability $p_j > 0$. We assume throughout that the states $j$
are labeled such that $0 < p_1 \le \cdots \le p_N$. If an observation reveals that the system is
in state $j$, the increased knowledge is reflected in an entropy decrease $\Delta S = -k_B \ln 2\, H$,
where $H = -\sum_j p_j \log_2 p_j > 0$ is the original missing information measured in bits. To
make the connection with thermodynamics, we assume that there is a heat reservoir at
temperature $T$ to which all energy in the form of heat must eventually be transferred,
possibly using intermediate steps such as storage at some lower temperature. In the presence
of this fiducial heat reservoir, the entropy decrease $\Delta S$ corresponds to a free energy increase
$\Delta F = -T \Delta S = +k_B T \ln 2\, H$. Each bit of missing information decreases the free energy by
the amount $k_B T \ln 2$; if information is acquired about the system, free energy increases.
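
As a numerical illustration of the relation $\Delta F = k_B T \ln 2\, H$, the following minimal Python sketch evaluates the missing information and the corresponding free-energy increase. The function names, the example distribution, and the temperature of 300 K are assumptions chosen for illustration only; they are not part of the theory described here.

```python
from math import log, log2

K_B = 1.380649e-23  # Boltzmann constant in J/K (SI defining value)


def missing_information_bits(probs):
    """Entropy H = -sum_j p_j log2 p_j in bits; all p_j must be positive."""
    if any(p <= 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be positive and sum to 1")
    return -sum(p * log2(p) for p in probs)


def free_energy_increase(probs, temperature):
    """Free energy gained on observation, Delta F = k_B T ln2 * H, in joules."""
    return K_B * temperature * log(2) * missing_information_bits(probs)


if __name__ == "__main__":
    probs = [0.125, 0.125, 0.25, 0.5]          # illustrative distribution
    print(missing_information_bits(probs))      # entropy in bits
    print(free_energy_increase(probs, 300.0))   # joules at T = 300 K
```

For laboratory-scale numbers of bits the resulting energies are tiny, which is why the comparison with thermodynamic entropies matters in what follows.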

The fact that entropy can decrease through observation — which underlies most proposals 
for a Maxwell demon — does not conflict with the second law of thermodynamics because 
the observer's physical state changes as a consequence of his interaction with the system. 
Szilard [?] discovered that no matter how complicated the change in the observer's
physical state is, the associated irreducible thermodynamic cost can be described solely in
terms of information. He found that in the presence of a heat reservoir at temperature $T$
each bit of information acquired by the observer has an energy cost of at least $k_B T \ln 2$.
Total available work is reduced not only by missing information, but also by information
the observer has acquired about the system. The physical nature of the cost of information
was clarified by Bennett [?], who applied Landauer's principle to the Maxwell demon
problem and showed that the energy cost has to be paid when information is erased. 

To keep the Landauer erasure cost of the observational record as low as possible, the 
information should be stored in maximally compressed form. The concept of a maximally 
compressed record is formalized in algorithmic information theory [?]. Bennett [?] and Zurek
[?] gave Szilard's theory its present form by using algorithmic information to quantify the
amount of information in an observational record. In particular, by exploiting Bennett's idea
of a reversible computer [?], Zurek [?] showed how an observational record can be replaced
by a compressed form at no thermodynamic cost. This means that the energy cost of the 
observational record can be reduced to the Landauer erasure cost of the compressed form. 

Let us denote by $s_j$ a binary string describing the $j$th state ($j = 1, \ldots, N$). A detailed
discussion of how a description of a physical state can be encoded in a binary string is
given in [?]. The exact form of the strings $s_j$ is of no importance for the theory outlined
here, however, because the information needed to generate a list of all the strings $s_j$ can
be treated as background information [10,?]. Background information is the information
needed to generate a list $s = ((s_1,p_1), \ldots, (s_N,p_N))$ of all $N$ states together with their
probabilities; i.e., background information is the information the observer has before the
observation.

Algorithmic information is defined with respect to a specific universal computer $U$. We
denote by $I_U(s_j|s)$ the conditional algorithmic information, with respect to the universal
computer $U$, to specify the $j$th state, given the background information [?]. More
precisely, $I_U(s_j|s)$ is the length in bits of the shortest self-delimiting program for $U$ that
generates the string $s_j$, given a minimal self-delimiting program to generate $s$. For a formal
definition of a universal computer $U$ and of $I_U(s_j|s)$ see Sec. II. It should be emphasized
that a minimal program that generates the list $s$ of descriptions of all states and their
probabilities can be short even when a minimal program that generates the description $s_j$
of a typical single state is very long [?].

Since total available work is reduced by $k_B T \ln 2$ by each bit of information the observer
acquires about the system as well as by each bit of missing information, the change in total
free energy or available work upon observing state $j$ can now be written as

$$\Delta F_{j,\rm tot} = -T \left[ \Delta S + k_B \ln 2\; I_U(s_j|s) \right] = -k_B T \ln 2 \left[ -H + I_U(s_j|s) \right] . \tag{1}$$

This definition of total free energy is closely related to Zurek's definition of physical entropy
[?]. Average conditional algorithmic information $\bar I_U(\cdot|s) = \sum_j p_j I_U(s_j|s)$ obeys the double
inequality [?,14]

$$H \le \bar I_U(\cdot|s) < H + O(1) , \tag{2}$$

where $O(1)$ denotes a positive computer-dependent constant [?]. It follows immediately that
the average change in total free energy, $\Delta \bar F_{\rm tot} = \sum_j p_j \Delta F_{j,\rm tot}$, is zero or negative:

$$0 \ge \Delta \bar F_{\rm tot} > -O(1)\, k_B T \ln 2 . \tag{3}$$

The left side of this double inequality establishes that acquiring information cannot increase
available work on the average. For standard choices for the universal computer $U$, e.g., a
Turing machine or Chaitin's LISP-based universal computer [?], the computer-dependent
$O(1)$ constant on the right is completely negligible in comparison with thermodynamic
entropies. Equation (3) therefore expresses that on the average, with respect to a standard
universal computer, total free energy remains essentially unchanged upon observation.
Despite the success of this theory, the presence of an arbitrary constant is disturbing. To
understand the issues involved in removing the arbitrary constant, we must introduce the
notions of simple and complex states.

Although the average information $\bar I_U(\cdot|s)$ is greater than or equal to $H$, there is a class
of low-entropy states that can be prepared without gathering a large amount of information. 
For example, in order to compress a gas into a fraction of its original volume, free energy has 
to be spent, but the length in bits of written instructions to prepare the compressed state is 
negligible on the scale of thermodynamic entropies. States that can be prepared reliably in 
a laboratory experiment usually are simple states, which means that there is a short verbal 
description of how to prepare such a state. 

The concept of a simple state is formalized in algorithmic information theory. A simple 
state is defined as a state for which $I_U(s_j|s) \ll H$; i.e., descriptions for simple states can be
generated by short programs. The total free energy increases, in the sense of Eq. (1), upon
observing the system to be in a simple state. Simplicity is a computer-dependent concept.
Standard universal computers like Turing machines reflect our intuitive notion of simplicity.
It is easy, however, to define a universal computer for which there are no short programs at 
all; such a computer would not recognize simplicity. 

Intuitively, simplicity ought to be an intrinsic property of a state. A computer formalizing 
the intuitive concept of simplicity should reflect this. In particular, for such a computer
a simple state should have a short program independent of the probability distribution
$p_1, \ldots, p_N$. This is not true for all universal computers. In Sec. II we introduce a universal
computer $U_\epsilon$ for which $I_{U_\epsilon}(s_j|s)$ is determined solely by the probabilities $p_1, \ldots, p_N$. For this
computer, a short program for the jth state reflects a large probability pj, not an intrinsic 
property of the state. We will say that such a computer does not recognize intrinsically 
simple states. 

Simple states are rare (there are fewer than $2^n$ states $j$ for which $I_U(s_j|s) < n$ [?]) and
thus arise rarely as the result of an observation, yet they are of great importance. Simple 
states are states for which the algorithmic contribution to total free energy is negligible. The 
concept of total free energy does not conflict with conventional thermodynamics because 
thermodynamic states are simple. If the theory does not have the notion of simple states, 
the connection with conventional thermodynamics is lost. 
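
The counting bound quoted in this paragraph follows from a short argument, sketched here under the sole assumption that distinct states must be generated by distinct programs. There are $2^k$ binary strings of length $k$, so the number of candidate programs shorter than $n$ bits bounds the number of states with $I_U(s_j|s) < n$:

```latex
\#\{\, j : I_U(s_j|s) < n \,\} \;\le\; \sum_{k=0}^{n-1} 2^k \;=\; 2^n - 1 \;<\; 2^n .
```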

The opposite of a simple state, a complex state, is defined as a state for which $I_U(s_j|s)$
is of the same order as $H$. Complex states arise not just through Maxwell demon-like
observations. We have shown [?] that initially simple states of chaotic Hamiltonian
systems in the presence of a perturbing environment rapidly evolve into extremely complex
states [?] for which the negative algorithmic contribution to total free energy is vastly
bigger than $H$ and thus totally dominates conventional free energy. In addition to giving
insight into the second law of thermodynamics, this result leads to a new approach to
quantum chaos [10,12,13].

In this paper, we show how the computer-dependent $O(1)$ constant can be eliminated
from the theory summarized above. In Sec. II we construct an optimal universal computer
for which the $O(1)$ constant is minimal. It turns out, however, that optimal universal
computers do not recognize intrinsically simple states and thus are unsatisfactory in formulating
the theory. This difficulty is solved in Sec. III where we show that any universal computer $U$
can be modified in a simple way such that (a) any state that is simple with respect to $U$ is
also simple with respect to the modified universal computer $U_3$ and (b) average conditional
information with respect to $U_3$ exceeds average conditional information with respect to an
optimal universal computer by at most 0.5 bits. Moreover, conditional algorithmic
information with respect to the modified computer $U_3$ obeys the inequality $H \le \bar I_{U_3}(\cdot|s) < H + 1$.
This double bound is the tightest possible in the sense that there is no tighter bound that
is independent of the probabilities $p_j$.



II. AN OPTIMAL UNIVERSAL COMPUTER 

The idea of an optimal universal computer is motivated by Zurek's discussion [?] of
Huffman coding [18] as an alternative way to quantify the information in an observational
record. We consider only binary codes, for which the code words are binary strings. Before
reviewing Huffman coding, we need to formalize the concept of a list consisting of descriptions
of states together with their probabilities.

Definition 1: A list of states $s$ is a string of the form $s = ((s_1,p_1), \ldots, (s_N,p_N))$ where
$N \ge 2$, $0 < p_1 \le \cdots \le p_N$, $\sum_j p_j = 1$, and $s_j$ is a binary string ($j = 1, \ldots, N$). More precisely,
the list of states $s$ is the binary string obtained from the list $((s_1,p_1), \ldots, (s_N,p_N))$ by some
definite translation scheme. One possible translation scheme is to represent parentheses,
commas, and numbers (i.e., the probabilities $p_j$) in ASCII code, and to precede each binary
string $s_j$ by a number giving its length $|s_j|$ in bits. The entropy of a list of states is
$H(s) = -\sum_j p_j \log_2 p_j$. Throughout this paper, $|t|$ denotes the length of the binary string $t$.

The Huffman code for a list of states $s = ((s_1,p_1), \ldots, (s_N,p_N))$ is a prefix-free or
instantaneous code [19], i.e., no code word is a prefix of any other code word, and can, like all
prefix-free codes, be represented by a binary tree as shown in Fig. 1. The number of links
leading from the root of the tree to a node is called the level of that node. If the level-$n$
node $a$ is connected to the level-$(n+1)$ nodes $b$ and $c$, then $a$ is called the parent of $b$ and
$c$; $a$'s children $b$ and $c$ are called siblings. There are exactly $N$ terminal nodes or leaves,
each leaf corresponding to a state $j$. Each link connecting two nodes is labeled 0 or 1. The
sequence of labels encountered on the path from the root to a leaf is the code word assigned
to the corresponding state. The code-word length of a state is thus equal to the level of
the corresponding leaf. Each node is assigned a probability $q_k$ such that the probability of
a leaf is equal to the probability $p_j$ of the corresponding state and the probability of each
non-terminal node is equal to the sum of the probabilities of its children.



A binary tree represents a Huffman code if and only if it has the sibling property [?],
i.e., if and only if each node except the root has a sibling, and the nodes can be listed in
order of nonincreasing probability with each node being adjacent to its sibling in the list.
The tree corresponding to a Huffman code and thus the Huffman code itself can be built
recursively. Create a list of $N$ nodes corresponding to the $N$ states. These $N$ nodes will be
the leaves of the tree that will now be constructed. Repeat the following procedure until
the tree is complete: Take two nodes with smallest probabilities, and make them siblings by
generating a node that is their common parent; replace in the list the two nodes by their
parent; label the two links branching from the new parent node by 0 and 1.
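
The recursive procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the definite algorithm assumed later in the text: the function names are hypothetical, ties between equal probabilities are broken by insertion order, and the convention that the first state receives an all-zero code word of maximum length is not enforced.

```python
import heapq
from math import log2


def huffman_code(probs):
    """Return one binary code word (a str of '0'/'1') per probability."""
    if len(probs) < 2:
        raise ValueError("need at least two states")
    # Heap entries: (probability, unique tie-breaker, leaf indices in the subtree).
    heap = [(p, j, [j]) for j, p in enumerate(probs)]
    heapq.heapify(heap)
    words = [""] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        # Take the two nodes with smallest probabilities and make them siblings.
        p0, _, leaves0 = heapq.heappop(heap)
        p1, _, leaves1 = heapq.heappop(heap)
        # Label the two links from the new parent by 0 and 1.
        for j in leaves0:
            words[j] = "0" + words[j]
        for j in leaves1:
            words[j] = "1" + words[j]
        # Replace the two nodes in the list by their parent.
        heapq.heappush(heap, (p0 + p1, counter, leaves0 + leaves1))
        counter += 1
    return words


def entropy_bits(probs):
    """H(s) = -sum_j p_j log2 p_j."""
    return -sum(p * log2(p) for p in probs)


def average_length(words, probs):
    """Average code-word length sum_j p_j |c_j|."""
    return sum(p * len(w) for w, p in zip(words, probs))


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.3, 0.4]
    words = huffman_code(probs)
    print(words, average_length(words, probs), entropy_bits(probs))
```

Prepending a bit to every leaf of a merged subtree is equivalent to building the tree explicitly and reading the labels from the root; it keeps the sketch short at the cost of some efficiency for large $N$.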

The procedure outlined above does not define a unique Huffman code for the list of 
states s, nor does it generally give a unique set of code-word lengths. In the following, we
will assume that we are given some definite algorithm to assign a Huffman code where the 
freedom in the coding procedure is used to assign to the first state (the one with smallest 
probability) a code word of maximum length consisting only of zeros. 

Definition 2: Given a list of states $s = ((s_1,p_1), \ldots, (s_N,p_N))$, the binary string $c_j(s)$
with length $l_j(s) = |c_j(s)|$ denotes the Huffman code word assigned to the $j$th state using a
definite algorithm with the property that $c_1(s) = 0 \ldots 0$ and $l_j(s) \le l_1(s)$ for $j = 2, \ldots, N$.
We denote the average Huffman code-word length by $\bar l(s) = \sum_j p_j l_j(s)$. The redundancy $r(s)$
of the Huffman code is defined by $r(s) = \bar l(s) - H(s)$.

The redundancy $r(s)$ obeys the bounds $0 \le r(s) < 1$, corresponding to bounds

$$H(s) \le \bar l(s) < H(s) + 1 \tag{4}$$

for the average code-word length. Huffman coding is optimal in the sense that there is
no prefix-free binary code with an average code-word length less than $\bar l(s)$. There can be,
however, optimal prefix-free codes that are not Huffman codes.
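
As a worked illustration of the bounds on the redundancy and on the average code-word length, consider the hypothetical four-state distribution $p = (0.1, 0.2, 0.3, 0.4)$, chosen here only as an example. The merging procedure first joins the states with probabilities 0.1 and 0.2, then joins the resulting node with the 0.3 state, and finally joins with the 0.4 state, which gives:

```latex
\begin{align*}
(l_1, l_2, l_3, l_4) &= (3, 3, 2, 1), \\
\bar l(s) &= 0.1\cdot 3 + 0.2\cdot 3 + 0.3\cdot 2 + 0.4\cdot 1 = 1.9, \\
H(s) &= -\sum_j p_j \log_2 p_j \approx 1.846, \\
r(s) &= \bar l(s) - H(s) \approx 0.054 .
\end{align*}
```

The redundancy lies well inside the interval $0 \le r(s) < 1$, as it does for most distributions that are not dominated by a single very probable state.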

The length $l_j(s)$ of the Huffman code word $c_j(s)$ cannot be determined from the
probability $p_j$ alone, but depends on the entire set of probabilities $p_1, \ldots, p_N$. The tightest general
bounds for $l_j(s)$ are

$$1 \le l_j(s) < -\log_g p_j + 1 , \tag{5}$$

where $g = (\sqrt{5} + 1)/2$ is the golden mean. The code-word length for some states $j$ thus can
differ widely from the value $-\log_2 p_j$. For most states $j$, however, the Huffman code-word
length is $l_j(s) \simeq -\log_2 p_j$. The following theorem [?] is a precise version of this statement.



Theorem 1: (a) $P^- = \sum_{j \in I^-} p_j < 2^{-m}$, where $I^- = \{ j \mid l_j(s) < -\log_2 p_j - m \}$, i.e.,
the probability that a state with probability $p$ has Huffman code-word length smaller than
$-\log_2 p - m$ is less than $2^{-m}$. (This is true for any prefix-free code.) (b) $P^+ = \sum_{j \in I^+} p_j <
2^{-c(m-2)+2}$, where $I^+ = \{ j \mid l_j(s) > -\log_2 p_j + m \}$ and $c = (1 - \log_2 g)^{-1} - 1 \simeq 2.27$, i.e.,
the probability that a state with probability $p$ has Huffman code-word length greater than
$-\log_2 p + m$ is less than $2^{-c(m-2)+2}$.

Proof: See [?]. □



Suppose that one characterizes the information content of a state $j$ by its Huffman
code-word length $l_j(s)$. Then in Eq. (2) average algorithmic information $\bar I_U(\cdot|s)$ is replaced by
average code-word length $\bar l(s)$, the $O(1)$ constant is replaced by 1, and Eq. (3) assumes the
concise form $0 \ge \Delta \bar F_{\rm tot} > -k_B T \ln 2$. This way of eliminating the $O(1)$ constant,
however, has a high price. Since Huffman code-word lengths depend solely on the probabilities
$p_1, \ldots, p_N$ (states with high probability are assigned shorter code words than states with
low probability), Huffman coding does not recognize intrinsically simple states. This means
that one of the most appealing features of the theory is lost, namely that the Landauer
erasure cost associated with states that can be prepared in a laboratory is negligible.

In the present article, we show that it is possible to retain this feature of the theory, yet
still eliminate the computer-dependent constant. We first attempt to do this by constructing
an optimal universal computer, i.e., a universal computer for which the $O(1)$ constant in
Eq. (2) is minimal. We find, however, that optimal universal computers do not recognize
intrinsically simple states, either. A solution to this problem will be given in Sec. III where
we discuss a class of nearly optimal universal computers.

We will need precise definitions of a computer and a universal computer, which we quote 
from Chapter 6.2 in [?].

Definition 3: A computer $C$ is a computable partial function that carries a program string
$p$ and a free data string $q$ into an output string $C(p,q)$ with the property that for each $q$
the domain of $C(\cdot,q)$ is a prefix-free set; i.e., if $C(p,q)$ is defined and $p$ is a proper prefix
of $p'$, then $C(p',q)$ is not defined. In other words, programs must be self-delimiting. $U$ is a
universal computer if and only if for each computer $C$ there is a constant ${\rm sim}(C)$ with the
following property: if $C(p,q)$ is defined, then there is a $p'$ such that $U(p',q) = C(p,q)$ and
$|p'| \le |p| + {\rm sim}(C)$.

In this definition, all strings are binary strings, and $|p|$ denotes the length of the string $p$
as before. The self-delimiting or prefix-free property entails that for each free data string q, 
the set of all valid program strings can be represented by a binary tree. 
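
The prefix-free property is easy to test mechanically for a finite set of strings. The sketch below is an illustration only; the function names are hypothetical, and it checks a finite list of words rather than the (generally infinite) domain of a universal computer. It also evaluates the Kraft sum $\sum 2^{-|p|}$, which is at most 1 for any prefix-free set and equals 1 exactly when the corresponding binary tree has no sibling-free node.

```python
from fractions import Fraction


def is_prefix_free(words):
    """True if no word in the collection is a prefix of another word.

    After lexicographic sorting, a word that is a prefix of any other word
    is also a prefix of its immediate successor, so adjacent pairs suffice.
    """
    ordered = sorted(words)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def kraft_sum(words):
    """Exact value of sum over words of 2**(-len(word))."""
    return sum(Fraction(1, 2 ** len(w)) for w in words)


if __name__ == "__main__":
    code = ["0", "10", "110", "111"]
    print(is_prefix_free(code), kraft_sum(code))
    print(is_prefix_free(["0", "01"]))
```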

For any binary string $t$ we denote by $t^*(U)$ (or just $t^*$ if no confusion is possible) the
shortest string for which $U(t^*,\Lambda) = t$, where $\Lambda$ is the empty string; i.e., $t^*$ is the shortest
program for the universal computer $U$ to calculate $t$. If there are several such programs,
we pick the one that is first in lexicographic order. This allows us to define conditional
algorithmic information.

Definition 4: The conditional algorithmic information $I_U(t_1|t_2)$ to specify the binary string
$t_1$, given the binary string $t_2$, is

$$I_U(t_1|t_2) = \min_{p \,\mid\, U(p,t_2^*) = t_1} |p| . \tag{6}$$

In words, $I_U(t_1|t_2)$ is the length of a shortest program for $U$ that computes $t_1$ in the presence
of the free data string $t_2^*$. In particular, the conditional algorithmic information $I_U(s_j|s)$ to
specify the $j$th state, given a list of states $s = ((s_1,p_1), \ldots, (s_N,p_N))$, is

$$I_U(s_j|s) = \min_{p \,\mid\, U(p,s^*) = s_j} |p| . \tag{7}$$

The average of $I_U(s_j|s)$ is denoted by $\bar I_U(\cdot|s) = \sum_j p_j I_U(s_j|s)$.

The next theorem puts a lower bound on the average information. 

Theorem 2: For any universal computer $U$ and any list of states $s = ((s_1,p_1), \ldots, (s_N,p_N))$,
the average conditional algorithmic information obeys the bound

$$\bar I_U(\cdot|s) \ge H(s) + r(s) + p_1 . \tag{8}$$



Proof: We denote by $s'_j$ a shortest string for which $U(s'_j,s^*) = s_j$. The strings $s'_j$
form a prefix-free code. If the $N$ strings $s'_j$ are represented by the leaves of a binary tree,
then there is at least one node that has no sibling. Otherwise $U(p,s^*)$ would be defined
only for a finite number $N$ of programs $p$, and $U$ would not be a universal computer. Let us
denote by $Q$ a sibling-free node and by $q$ its probability ($q \ge p_1$). Then a shorter prefix-free
code $\{s''_j\}$ can be obtained by moving node $Q$ down one level. More precisely, for states $j$
corresponding to leaves of the subtree branching from node $Q$, $s''_j$ is obtained from $s'_j$ by
removing the digit corresponding to the link between node $Q$ and its parent; for all other
states $j$, $s''_j = s'_j$. The code-word lengths of the new code are $|s''_j| = |s'_j| - 1$ if state $j$ is a
leaf of the subtree branching from node $Q$ and $|s''_j| = |s'_j|$ otherwise. Since the new code is
prefix-free, its average code-word length is greater than or equal to the Huffman code-word
length $\bar l(s)$. It follows that

$$\bar I_U(\cdot|s) = \sum_j p_j |s'_j| = \sum_j p_j |s''_j| + q \ge \bar l(s) + p_1 = H(s) + r(s) + p_1 , \tag{9}$$

which proves the theorem. □

We can now proceed to define an optimal universal computer. 

Definition 5: $U$ is an optimal universal computer if there is a constant $\epsilon > 0$ such that for
all lists of states $s = ((s_1,p_1), \ldots, (s_N,p_N))$ with $p_1 > \epsilon$ the average conditional algorithmic
information has its minimum value

$$\bar I_U(\cdot|s) = H(s) + r(s) + p_1 . \tag{10}$$

Theorem 3: For any $\epsilon > 0$ there is an optimal universal computer $U_\epsilon$.

Proof: Let $U$ be an arbitrary universal computer and $\epsilon > 0$. For any list of states
$s = ((s_1,p_1), \ldots, (s_N,p_N))$ with $p_1 > \epsilon$ we define $c'_1(s) = c_1(s) \circ 1 = 0 \ldots 01$ and $c'_j(s) = c_j(s)$
for $j = 2, \ldots, N$, where $\circ$ denotes concatenation of strings. The strings $c'_j(s)$ thus differ from
the Huffman code $c_j(s)$ in that a 1 has been appended to the code word for the state $j = 1$.
According to Eq. (5), $l_1(s) + 1 \le N_0 = \lfloor -\log_g \epsilon + 2 \rfloor$, where $g = (\sqrt{5} + 1)/2$ and $\lfloor x \rfloor$ denotes
the largest integer less than or equal to $x$. We denote by $\sigma_0$ a string composed of $N_0$ zeros;
none of the strings $c'_j(s)$ is longer than $\sigma_0$.

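For the definition of $U_\epsilon(p,q)$ we distinguish two cases. If the binary string $q$ is of the
form

$$q = \sigma_0 \circ q_s \quad \text{with} \quad U(q_s, \Lambda) = s \tag{11}$$

for some list of states $s = ((s_1,p_1), \ldots, (s_N,p_N))$ with $p_1 > \epsilon$, then $U_\epsilon(p,q)$ is defined for

$$p \in D(q) = \{ \sigma_0 \circ p' \mid U(p',q) \text{ is defined} \} \cup \{ c'_j(s) \mid 1 \le j \le N \} , \tag{12}$$

with

$$U_\epsilon(\sigma_0 \circ p', q) = U(p',q) \quad \text{whenever } U(p',q) \text{ is defined} \tag{13}$$

and

$$U_\epsilon(c'_j(s), q) = s_j \quad \text{for } j = 1, \ldots, N . \tag{14}$$

If the binary string $q$ is not of the form (11), then $U_\epsilon(p,q)$ is defined for

$$p \in D(q) = \{ \sigma_0 \circ p' \mid U(p',q) \text{ is defined} \} , \tag{15}$$

with

$$U_\epsilon(\sigma_0 \circ p', q) = U(p',q) \quad \text{whenever } U(p',q) \text{ is defined} . \tag{16}$$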

In both cases, the set $D(q)$, which is the domain of $U_\epsilon(\cdot,q)$, is clearly prefix-free. Moreover,
since $U_\epsilon(\sigma_0 \circ p, q) = U(p,q)$ whenever $U(p,q)$ is defined and $U$ is a universal computer, $U_\epsilon$
is also a universal computer, with the simulation constant ${\rm sim}(C)$ increased by $N_0$.

For any string $t$ the minimal program on $U_\epsilon$ (i.e., the shortest program given an empty
free data string) is $t^*(U_\epsilon) = \sigma_0 \circ t^*(U)$, where $t^*(U)$ is the minimal program for $t$ on $U$.
In particular, the shortest program for $U_\epsilon$ to compute $s$ is $s^*(U_\epsilon) = \sigma_0 \circ s^*(U)$. Since
$U_\epsilon(c'_j(s), s^*(U_\epsilon)) = s_j$ and $|c'_j(s)| \le N_0$ for $j = 1, \ldots, N$ while $|p| > N_0$ for all other programs
$p \in D(s^*(U_\epsilon))$, it follows immediately that

$$I_{U_\epsilon}(s_j|s) = |c'_j(s)| = |c_j(s)| + \delta_{1j} = l_j(s) + \delta_{1j} \tag{17}$$

and thus that

$$\bar I_{U_\epsilon}(\cdot|s) = \sum_j p_j I_{U_\epsilon}(s_j|s) = \sum_j p_j |c'_j(s)| = \sum_j p_j |c_j(s)| + p_1 = \bar l(s) + p_1 = H(s) + r(s) + p_1 . \tag{18}$$

□

If $U(q_s, \Lambda) = U_\epsilon(\sigma_0 \circ q_s, \Lambda) = s$, i.e., if $q_s$ is a program for $U$ generating a list of states $s$,
the programs $p$ for which $U_\epsilon(p, \sigma_0 \circ q_s)$ is defined can be represented by a binary tree similar
to Fig. 2. With respect to the binary tree representing the Huffman code (Fig. 1), the leaf
for the $j = 1$ state has been moved up one level to make room for the new node labeled by
$U$. This new node leads to a subtree representing all programs $p'$ for which $U(p', \sigma_0 \circ q_s)$ is
defined.

The operation of the optimal universal computer $U_\epsilon$ can be described in the following
way. When $U_\epsilon$ reads a string that begins with $N_0$ zeros from its program tape, $U_\epsilon$ disregards
the $N_0$ zeros and interprets the rest of the string as a program for the universal computer
$U$, executing it accordingly. If $U_\epsilon$ encounters the digit 1 while reading the first $N_0$ digits
from its program tape, $U_\epsilon$ interrupts reading from the program tape, reads in the free
data string, and executes it. If the result of executing the free data string is a list of states
$s = ((s_1,p_1), \ldots, (s_N,p_N))$, $U_\epsilon$ establishes the modified Huffman code $\{c'_j(s)\}$ for $s$, continues
reading digits from the program tape until the string read matches one of the code words,
say $c'_{j_0}(s)$, and then prints the string $s_{j_0}$. The output of $U_\epsilon$ is undefined in all other cases.
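
The branching just described can be illustrated with a toy dispatcher. This is only a sketch under strong assumptions: it does not run any universal computer or execute a free data string, it takes the Huffman code words as given (with the first one consisting only of zeros, as in Definition 2), and all function names are hypothetical.

```python
from math import floor, log, sqrt

GOLDEN_MEAN = (sqrt(5) + 1) / 2


def n_zero(epsilon):
    """N_0 = floor(-log_g(epsilon) + 2), with g the golden mean."""
    return floor(-log(epsilon) / log(GOLDEN_MEAN) + 2)


def modified_code(huffman_words):
    """Append a 1 to the code word of state j = 1 (list index 0)."""
    first = huffman_words[0]
    if set(first) != {"0"}:
        raise ValueError("c_1(s) must consist only of zeros")
    return [first + "1"] + list(huffman_words[1:])


def dispatch(program, huffman_words, epsilon):
    """Classify a complete program string as the text describes.

    Returns ("U", rest) if the program starts with N_0 zeros, so that the
    rest would be handed to U; ("state", j) with 1-based j if the program
    equals the modified code word c'_j; and None otherwise (undefined).
    """
    n0 = n_zero(epsilon)
    if program.startswith("0" * n0):
        return ("U", program[n0:])
    words = modified_code(huffman_words)
    if program in words:
        return ("state", words.index(program) + 1)
    return None


if __name__ == "__main__":
    words = ["000", "001", "01", "1"]   # a Huffman code for (1/8, 1/8, 1/4, 1/2)
    print(n_zero(0.125))
    print(dispatch("0001", words, 0.125))
    print(dispatch("000000101", words, 0.125))
```

Because every modified code word contains a 1 within its first $N_0$ digits, the two branches can never both apply, which is the prefix-free property of $D(q)$ in miniature.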



Since $r(s) + p_1 < 1$, $H(s) \le \bar I_U(\cdot|s) < H(s) + 1$ for any optimal universal computer $U$.
For the particular optimal universal computer $U_\epsilon$ defined in the proof of theorem 3, however,
the information $I_{U_\epsilon}(s_j|s)$ is completely determined by the Huffman code-word length for
the $j$th state and therefore is completely determined by the probabilities $p_1, \ldots, p_N$. This
optimal universal computer does not recognize intrinsically simple states. As an aside, note
that $U_\epsilon$ cannot give a short description of the background information for any probability
distribution, because a minimal program for computing the list of states $s$ on $U_\epsilon$ must begin
with $N_0$ zeros. It turns out that all optimal universal computers, not just $U_\epsilon$, are unable to
recognize intrinsically simple states. The following theorem formulates this inability for all
optimal universal computers in a slightly weaker form than holds for $U_\epsilon$. As a consequence,
the use of algorithmic information with respect to an optimal universal computer to quantify
the information in an observational record presents no advantage over the use of Huffman
coding.

Theorem 4: For any optimal universal computer $U$ and any list of states $s =
((s_1,p_1), \ldots, (s_N,p_N))$ for which $\bar I_U(\cdot|s) = H(s) + r(s) + p_1$, the following holds: If $p_i > p_j$,
then $I_U(s_i|s) \le I_U(s_j|s)$. Optimal universal computers therefore do not recognize
intrinsically simple states.



Proof: To prove the theorem, we show that $\bar I_U(\cdot|s) > H(s) + r(s) + p_1$ for any universal
computer $U$ and any list of states $s = ((s_1,p_1), \ldots, (s_N,p_N))$ for which there are indices
$i$ and $j$ such that $p_i > p_j$ but $I_U(s_i|s) > I_U(s_j|s)$. We denote by $s'_j$ a shortest string for
which $U(s'_j, s^*) = s_j$. The strings $s'_j$ form a prefix-free code. Following an argument similar
to the proof of theorem 2, we can shorten that code on the average by moving a
sibling-free node one level down and in addition by interchanging the code words for states $i$ and
$j$. The resulting shorter code must obey the Huffman bound, from which the inequality
$\bar I_U(\cdot|s) > \bar l(s) + p_1 = H(s) + r(s) + p_1$ follows. □

III. PRESERVING SIMPLE STATES BY GIVING UP 1/2 BIT



Although the discussion in the last section shows that optimal universal computers 
present no advantages over Huffman coding, the main idea behind their construction can be 
further exploited. If the subtree representing the programs for the universal computer U is 
not attached next to the $j = 1$ leaf as in Fig. 2, but instead is attached close to the root as
in Fig. 3, the resulting universal computer $U_3$ combines the desirable properties of Huffman
coding and the computer U. This is the content of the following theorem. 

Theorem 5: For any universal computer $U$ there is a universal computer $U_3$ such that

$$I_{U_3}(t_1|t_2) \le I_U(t_1|t_2) + 3 \tag{19}$$

for all binary strings $t_1$ and $t_2$, and that

$$H(s) \le \bar I_{U_3}(\cdot|s) < H(s) + 1 \tag{20}$$

and

$$\bar I_{U_3}(\cdot|s) \le H(s) + r(s) + 1/2 \tag{21}$$

for all lists of states $s = ((s_1,p_1), \ldots, (s_N,p_N))$.

Proof: Let $U$ be an arbitrary universal computer. For any list of states $s =
((s_1,p_1), \ldots, (s_N,p_N))$ we define the set of strings $c'_j(s)$ as follows. We start from the binary
tree formed by the Huffman code words $c_j(s)$, where we denote by $q_1$ the probability of the
level-1 node connected to the root by the link labeled 0 (see Fig. 1). According to the value
of $q_1$, we distinguish two cases. In the case $q_1 \le 1/2$, $c'_j(s) = 01 \circ c''_j(s)$ if $c_j(s)$ is of the
form $c_j(s) = 0 \circ c''_j(s)$, and $c'_j(s) = c_j(s)$ if $c_j(s)$ is of the form $c_j(s) = 1 \circ c''_j(s)$. In the case
$q_1 > 1/2$, $c'_j(s) = 01 \circ c''_j(s)$ if $c_j(s)$ is of the form $c_j(s) = 1 \circ c''_j(s)$, and $c'_j(s) = 1 \circ c''_j(s)$ if
$c_j(s)$ is of the form $c_j(s) = 0 \circ c''_j(s)$.

Figure 3 illustrates the binary tree formed by the code words $c'_j(s)$ for the case $q_1 \le 1/2$.
Of the two main subtrees emerging from the level-1 nodes in Fig. 1, the subtree having
smaller probability is moved up one link and attached to the node labeled 01, and the
subtree having larger probability is attached to the node labeled 1. In this way, the node
labeled 00 is freed for the subtrees representing the valid programs for $U$.
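
The rearrangement of the two main subtrees can be sketched as follows. This is a minimal illustration that takes the Huffman code words as input; the function names are hypothetical, and the tie $q_1 = 1/2$ is assigned to the first case as an assumption of the sketch.

```python
def move_subtrees(huffman_words, probs):
    """Return the code words c'_j: the lighter level-1 subtree is attached
    under the prefix 01, the heavier one under the prefix 1."""
    # q1 is the probability of the level-1 node reached by the link labeled 0.
    q1 = sum(p for w, p in zip(huffman_words, probs) if w[0] == "0")
    lighter = "0" if q1 <= 0.5 else "1"
    return ["01" + w[1:] if w[0] == lighter else "1" + w[1:]
            for w in huffman_words]


def average_length(words, probs):
    """Average code-word length sum_j p_j |c_j|."""
    return sum(p * len(w) for w, p in zip(words, probs))


if __name__ == "__main__":
    probs = [0.125, 0.125, 0.25, 0.5]
    huffman = ["110", "111", "10", "0"]      # one Huffman code for probs
    moved = move_subtrees(huffman, probs)
    print(moved)
    print(average_length(huffman, probs), average_length(moved, probs))
```

In this example $q_1 = 1/2$, so the average length grows by exactly $\min(q_1, 1 - q_1) = 1/2$ bit, the largest increase the construction can cause, and no new code word begins with 00.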

For the definition of $U_3(p,q)$ we distinguish three cases. If the binary string $q$ is of the
form

$$q = 000 \circ q_s \quad \text{with} \quad U(q_s, \Lambda) = s \tag{22}$$

for some list of states $s = ((s_1,p_1), \ldots, (s_N,p_N))$, then $U_3(p,q)$ is defined for

$$p \in D(q) = \{ 000 \circ p' \mid U(p',q) \text{ is defined} \} \cup \{ 001 \circ p' \mid U(p',q_s) \text{ is defined} \} \cup \{ c'_j(s) \mid 1 \le j \le N \} , \tag{23}$$

with

$$U_3(000 \circ p', q) = U(p',q) \quad \text{whenever } U(p',q) \text{ is defined} , \tag{24}$$

$$U_3(001 \circ p', q) = U(p',q_s) \quad \text{whenever } U(p',q_s) \text{ is defined} , \tag{25}$$

and

$$U_3(c'_j(s), q) = s_j \quad \text{for } j = 1, \ldots, N . \tag{26}$$

If the binary string $q$ is of the form

$$q = 000 \circ q' , \tag{27}$$

but there is no list of states $s$ such that $U(q', \Lambda) = s$, then $U_3(p,q)$ is defined for

$$p \in D(q) = \{ 000 \circ p' \mid U(p',q) \text{ is defined} \} \cup \{ 001 \circ p' \mid U(p',q') \text{ is defined} \} , \tag{28}$$

with

$$U_3(000 \circ p', q) = U(p',q) \quad \text{whenever } U(p',q) \text{ is defined} \tag{29}$$

and

$$U_3(001 \circ p', q) = U(p',q') \quad \text{whenever } U(p',q') \text{ is defined} . \tag{30}$$

Finally, if $q$ is not of the form (27), then $U_3(p,q)$ is defined for

$$p \in D(q) = \{ 000 \circ p' \mid U(p',q) \text{ is defined} \} , \tag{31}$$

with

$$U_3(000 \circ p', q) = U(p',q) \quad \text{whenever } U(p',q) \text{ is defined} . \tag{32}$$

In all three cases, the set $D(q)$, which is the domain of $U_3(\cdot,q)$, is clearly prefix-free.
Moreover, since $U_3(000 \circ p, q) = U(p,q)$ whenever $U(p,q)$ is defined and $U$ is a universal
computer, $U_3$ is also a universal computer, with the simulation constant ${\rm sim}(C)$ increased
by 3. Equation (19) holds because of the following. The minimal program for $t_2$ on $U_3$ in the
presence of an empty free data string is $t_2^*(U_3) = 000 \circ t_2^*(U)$, since $U_3(p,\Lambda)$ is defined only
if $p = 000 \circ p'$ and $U(p',\Lambda)$ is defined, in which case $U_3(p,\Lambda) = U(p',\Lambda)$. If $p$ is a minimal
program for $t_1$ on $U$ in the presence of the minimal program for $t_2$, i.e., if

$$U(p, t_2^*(U)) = t_1 , \qquad |p| = I_U(t_1|t_2) , \tag{33}$$

then

$$U_3(001 \circ p, t_2^*(U_3)) = U_3(001 \circ p, 000 \circ t_2^*(U)) = U(p, t_2^*(U)) = t_1 \tag{34}$$

and therefore

$$I_{U_3}(t_1|t_2) \le |001 \circ p| = |p| + 3 . \tag{35}$$

The strings $c_j'(s)$ form a prefix-free code with an unused code word of length 2, for which
$\sum_j p_j |c_j'(s)| < H(s) + 1$ according to theorem 3 in Ref. [?]. (In Ref. [?] the inequality appears with
a $\le$ sign, but equality can occur only if the smallest probability $p_1$ is equal to zero, a case
we have excluded.) The shortest program for $U_3$ to compute $s$ is $s^*(U_3) = 000 \circ s^*(U)$,
where $s^*(U)$ is the shortest program for $U$ to compute $s$. Since $U_3(c_j'(s), s^*(U_3)) = s_j$ for
$j = 1, \ldots, N$, it follows immediately that $I_{U_3}(s_j|s) \le |c_j'(s)|$ and thus that

$$I_{U_3}(\cdot|s) = \sum_j p_j I_{U_3}(s_j|s) \le \sum_j p_j |c_j'(s)| < H(s) + 1 , \tag{36}$$

which establishes the upper bound in Eq. (20). The lower bound in Eq. (20) holds for all
universal computers. Equation (21) follows from

$$\sum_j p_j |c_j'(s)| = \sum_j p_j |c_j(s)| + \min(q_1, 1 - q_1) = l(s) + \min(q_1, 1 - q_1) \le H(s) + r(s) + 1/2 . \tag{37}$$

□ 
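
The chain of inequalities in Eqs. (36) and (37) can be checked numerically for a concrete distribution. The following Python sketch is an illustration only: the helper `huffman_lengths` and the example probabilities are assumptions introduced here, not part of the paper. It builds Huffman code word lengths with a heap, records the probability of the lighter level-1 subtree, and compares the average lengths with $H(s)$.

```python
import heapq
from math import log2


def huffman_lengths(probs):
    """Return (code word lengths, min(q1, 1 - q1)) for a Huffman code; needs >= 2 states."""
    # Heap entries: (probability, unique tie-breaker, indices of the leaves in the subtree).
    heap = [(p, j, [j]) for j, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    q_min = None
    while len(heap) > 1:
        pa, _, a = heapq.heappop(heap)
        pb, _, b = heapq.heappop(heap)
        for j in a + b:
            lengths[j] += 1        # every leaf below the merged node gains one digit
        if not heap:
            q_min = pa             # last merge joins the two level-1 nodes; pa is the lighter
        heapq.heappush(heap, (pa + pb, counter, a + b))
        counter += 1
    return lengths, q_min


probs = [0.1, 0.2, 0.3, 0.4]                      # hypothetical example distribution
lengths, q_min = huffman_lengths(probs)
H = -sum(p * log2(p) for p in probs)              # statistical entropy H(s) in bits
l = sum(p * n for p, n in zip(probs, lengths))    # average Huffman length l(s)
l_mod = l + q_min                                 # average length of the modified code
print(lengths, H, l, l_mod)
```

For this example the lengths are 3, 3, 2, 1, so $l(s) = 1.9$ and the modified code has average length $2.3$, which lies between $H(s) \approx 1.85$ and $H(s) + 1$.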

If $U(q_s,\Lambda) = U_3(000 \circ q_s,\Lambda) = s$, i.e., if $q_s$ is a program for $U$ generating a list of states
$s$, the programs $p$ for which $U_3(p, 000 \circ q_s)$ is defined can be represented by a binary tree
similar to Fig. 3. The level-3 node labeled $U$ is the root of a subtree corresponding to the
programs $p'$ for which $U(p', 000 \circ q_s)$ is defined, and the level-3 node labeled $U'$ is the root
of a subtree corresponding to the programs $p'$ for which $U(p', q_s)$ is defined.

The operation of the universal computer $U_3$ can be described in the following way. When
$U_3$ reads a string that begins with the prefix 000 from its program tape, $U_3$ disregards the
prefix and interprets the rest of the string as a program for the universal computer $U$,
executing it accordingly. When $U_3$ reads a string that begins with the prefix 001 from its
program tape, the output is only defined if the free data string begins with 000, in which
case $U_3$ disregards the first 3 digits of the program and free data strings and interprets the
rest of the strings as program and free data strings for the universal computer $U$, executing
it accordingly. If $U_3$ encounters the digit 1 while reading the first two digits from its program
tape, $U_3$ interrupts reading from the program tape, reads in the free data string, and executes
it. If the result of executing the free data string is a list of states $s = ((s_1,p_1), \ldots, (s_N,p_N))$,
$U_3$ establishes the modified Huffman code $\{c_j'(s)\}$ for $s$, continues reading digits from the
program tape until the string read matches one of the code words, say $c_{j_0}'(s)$, and then prints
the string $s_{j_0}$. The output of $U_3$ is undefined in all other cases.
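
The three-way dispatch just described can be sketched as a wrapper around an arbitrary partial function standing in for $U$. This is a minimal illustration under explicit assumptions: `make_u3`, `toy_u` and `toy_code` are hypothetical names, the stand-in `toy_u` is a finite lookup table rather than a universal computer, whole program strings are passed in instead of being read digit by digit from a tape, and "undefined" is represented by `None`.

```python
def make_u3(u, modified_code):
    """Wrap a partial function u(p, q) (None = undefined) into a sketch of U_3.

    modified_code(s) must return the code words c'_j(s) for a list of states
    s = [(s_1, p_1), ..., (s_N, p_N)]. Both callables are hypothetical stand-ins.
    """
    def u3(p, q):
        if p.startswith("000"):              # drop the prefix, run the rest on U
            return u(p[3:], q)
        if p.startswith("001"):              # drop 3 digits of program and free data
            return u(p[3:], q[3:]) if q.startswith("000") else None
        if not q.startswith("000"):          # code words need q = 000 o q_s, Eq. (22)
            return None
        s = u(q[3:], "")                     # execute the free data string
        if not isinstance(s, list):          # the result is not a list of states
            return None
        for (state, _prob), word in zip(s, modified_code(s)):
            if p == word:                    # program matches the code word c'_j(s)
                return state
        return None                          # undefined in all other cases
    return u3


# Hypothetical toy stand-ins for U and for the modified Huffman code.
states = [("a", 0.1), ("b", 0.2), ("c", 0.3), ("d", 0.4)]
table = {("1", ""): states, ("0", "0001"): "hello"}
toy_u = lambda p, q: table.get((p, q))
toy_code = lambda s: ["110", "111", "10", "01"]

u3 = make_u3(toy_u, toy_code)
print(u3("10", "0001"))    # code word of the third state: prints c
print(u3("0000", "0001"))  # prefix 000: prints hello
```

The point of the sketch is only the order of the tests: the prefixes 000 and 001 hand control to $U$, and every other program is interpreted as a code word relative to the list of states generated by the free data string.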

The computer $U_3$ compromises between the desirable properties of algorithmic information
and Huffman coding. Since algorithmic information defined with respect to $U_3$ exceeds
algorithmic information relative to $U$ by at most 3 bits, states that are simple with respect
to $U$ are simple with respect to $U_3$. Those 3 bits are the price to pay for a small upper bound
on average information. The average conditional algorithmic information $I_{U_3}(\cdot|s)$ obeys the
close double bound Eq. (20) and exceeds the Huffman bound $l(s)$ by at most 0.5 bits. This
half bit is the price to pay for the recognition of intrinsically simple states.

IV. CONCLUSION 

We have shown that any universal computer $U$ can be modified in such a way that (i)
the modified universal computer $U_3$ recognizes the same intrinsically simple states as $U$ and
(ii) average algorithmic information with respect to $U_3$ obeys the same close double bound
as Huffman coding, $H(s) \le I_{U_3}(\cdot|s) < H(s) + 1$. If for any choice of a universal computer $U$,
total free energy is defined with respect to the corresponding modified universal computer
$U_3$, i.e., if the change of total free energy due to finding the system in the $j$th state is
$\Delta F_{j,{\rm tot}} = -k_B T \ln 2 \, [-H(s) + I_{U_3}(s_j|s)]$, then the bounds for the average change in total free
energy are given by

$$0 \ge \Delta F_{\rm tot} > -k_B T \ln 2 \tag{38}$$

instead of by Eq. @ . 
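
The step from the double bound on $I_{U_3}(\cdot|s)$ to Eq. (38) can be written out explicitly. The sketch below assumes that $\Delta F_{\rm tot}$ denotes the average $\sum_j p_j \Delta F_{j,{\rm tot}}$, which is consistent with the definitions above but is spelled out here only for illustration.

```latex
\begin{align*}
\Delta F_{\rm tot} &= \sum_j p_j \, \Delta F_{j,{\rm tot}}
   = -k_B T \ln 2 \, \bigl[ -H(s) + I_{U_3}(\cdot|s) \bigr] , \\
0 \le I_{U_3}(\cdot|s) - H(s) < 1
   \quad &\Longrightarrow \quad
   0 \ge \Delta F_{\rm tot} > -k_B T \ln 2 .
\end{align*}
```

Multiplying the bracket by the negative factor $-k_B T \ln 2$ reverses both inequalities, which is why the strict bound ends up on the lower side.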

This result effectively eliminates the undetermined computer-dependent constant from
applications of algorithmic information theory to statistical physics. Except for an unavoidable
loss due to the coding, bounded by $k_B T \ln 2$, the available work is on average independent
of the information the observer has acquired about the system, any decrease of the statistical
entropy being balanced by an equal increase in algorithmic information.
FIGURES



FIG. 1. Binary tree representing the Huffman code for 6 states with probabilities $p_1, \ldots, p_6$.
The node probabilities are defined recursively, i.e., $q_7 = p_1$, $q_8 = p_2$, the probability of their
parent node is $q_7 + q_8$, etc. Code words correspond to branch labels; e.g., the code word for
the third state (probability $p_3$) is 110.

FIG. 2. Binary tree representing all valid programs for the optimal universal computer $U_\epsilon$ in
the presence of a free data string generating a list of states $((s_1,p_1), \ldots, (s_6,p_6))$. With respect to
the tree in Fig. 1, the node labeled $q_7 = p_1$ has been moved up one level to make room for the
subtree representing programs for $U$.

FIG. 3. Binary tree representing all valid programs for the universal computer $U_3$ in the presence
of a free data string generating a list of states $s = ((s_1,p_1), \ldots, (s_6,p_6))$. With respect to
the tree in Fig. 1, the level-1 node labeled $q_1$ has been moved up one level to make room for the
subtrees representing programs for $U$. More precisely, the binary tree represents the programs $p$
for which $U_3(p, 000 \circ q_s)$ is defined if $U_3(000 \circ q_s, \Lambda) = s$. The node labeled $U$ is the root of a subtree
corresponding to the programs $p'$ for which $U(p', 000 \circ q_s)$ is defined, and the node labeled $U'$ is
the root of a subtree corresponding to the programs $p'$ for which $U(p', q_s)$ is defined.



Figure 1 (Schack): "fig1-1.png"
Figure 2 (Schack): "fig1-2.png"
Figure 3 (Schack): "fig1-3.png"

These figures are available in "png" format from: http://arXiv.org/ps/hep-th/9409022v1



